-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Speed up DatetimeConverter by using Matplotlib's epoch2num when possible... #6650
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Speed up DatetimeConverter by using Matplotlib's epoch2num when possible... #6650
Conversation
def _check_equal(self, ts1, ts2 = None): | ||
from pandas.tseries.converter import DatetimeConverter | ||
from matplotlib import dates | ||
from numpy.testing import assert_almost_equal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy's assert_almost_equal is necessary, since pandas' variant does not seem to allow to specify the required significance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe let's modify pandas version to pass thru the significance chrck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cython function decimal_almost_equal (https://github.com/pydata/pandas/blob/master/pandas/src/testing.pyx#L37) seems to do what we need, but I don't know how/where it is exposed.
Edit: Sorry, decimal_almost_equal implements just the scalar variant. The problem with pandas' assert_almost_equal is not only that it hardcodes the number of decimal, but also that the meaning of decimal is different. Besides numpy.testing.assert_almost_equal, also numpy.allclose could be an option here.
Will do. A quick question though: I just noticed there are already tests in pandas/tseries/test/test_converter.py. Would that be the correct place to add them instead, rather than pandas/tests/test_tseries.py? |
ok to put then in the converter test suite |
Maybe also add a vbench speed benchmark? |
I created a simple vbench benchmark for datetimeindex conversion in vb_suite/timeseries.py. Before I commit though, 2 small questions:
|
you can use 1/1/13 is fine |
from pandas.tseries.converter import DatetimeConverter | ||
""" | ||
|
||
datetimeindex_converter = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to define rng
in the setup. pls post a run of this vbench as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure, rng
is already defined in common_setup; it is best practice to redefine it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry....you are right..ok then
vbench run:
|
@TomAugspurger ok with this? |
Yep this looks good. Should I merge it? |
@agijsberts just need a release note |
…onding note in release notes..
Latest commit implements comments and adds release notes. Ok like this? |
Sorry, some tests fail in the last commit. If I replace |
the first causes a circular import |
Good to know, thanks! |
@TomAugspurger you have the honors! |
PERF: Speed up DatetimeConverter by using Matplotlib's epoch2num when possible...
Thanks @agijsberts. |
closes #6636
This fixes the performance bottleneck described in #6636 by using the vectorized epoch2num for DataIndex arrays. The included test ensures that the DatetimeConverter produces the same results as the Matplotlib's date2num for individual Timestamp objects as well as DatetimeIndex array objects.